{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Copyright (c) Cornac Authors. All rights reserved.*\n",
    "\n",
    "*Licensed under the Apache 2.0 License.*\n",
    "\n",
    "# Hyperparameter Search for VAECF"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/PreferredAI/cornac/blob/master/tutorials/param_search_vaecf.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://github.com/PreferredAI/cornac/blob/master/tutorials/param_search_vaecf.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
    "  </td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook describes how to perform hyperparameter searches in Cornac. As a running example, we consider the VAECF model and MovieLens 100K dataset. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import cornac\n",
    "from cornac.datasets import movielens\n",
    "from cornac.eval_methods import RatioSplit\n",
    "from cornac.hyperopt import Discrete, Continuous\n",
    "from cornac.hyperopt import GridSearch, RandomSearch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prepare an experiment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we load our data and instantiate the necessary objects for running an experiment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load MovieLens 100K ratings\n",
    "ml_100k = movielens.load_feedback(variant=\"100K\")\n",
    "\n",
    "# Define an evaluation method to split feedback into train, validation and test sets\n",
    "ratio_split = RatioSplit(data=ml_100k, test_size=0.1, val_size=0.1, seed=123, verbose=False)\n",
    "\n",
    "# Instantiate Recall@100 for evaluation\n",
    "rec100 = cornac.metrics.Recall(100)\n",
    "\n",
    "# Instantiate VACF with fixed hyperparameters\n",
    "vaecf = cornac.models.VAECF(k=20, autoencoder_structure=[40], learning_rate=0.005, n_epochs=100, seed=123, verbose=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Perform searches for the *beta* parameter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assume for now we are interested in determining a good value for the hyperparameter $\\beta$, the weight of the KL term in VAECF's objective:\n",
    "\n",
    "$$ \\mathcal{L}(\\theta,\\phi) = \\mathbb{E}_{q_{\\phi}(z|r)}[\\log{p_{\\theta}(r|z)}] - \\beta \\cdot KL(q_{\\phi}(z|r)||p(z)), $$\n",
    "\n",
    "where $z$ is the latent variable (user representation), and $r$ is the observed user-item feedback."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Wrap vaecf into search methods\n",
    "All we need to do is to wrap our instantiated model ``vaecf`` into a search method and specify a search space for *beta*, a metric of interest, as well as the evaluation method. Cornac supports two types of searching methods, namely random search and grid search, we consider both of them here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# GridSearch\n",
    "gs_vaecf = GridSearch(\n",
    "    model=vaecf,\n",
    "    space=[\n",
    "        Discrete(\"beta\", np.linspace(0.0, 2.0, 11)),\n",
    "    ],\n",
    "    metric=rec100,\n",
    "    eval_method=ratio_split,\n",
    ")\n",
    "\n",
    "# RandomSearch\n",
    "rs_vaecf = RandomSearch(\n",
    "    model=vaecf,\n",
    "    space=[\n",
    "        Continuous(\"beta\", low=0.0, high=2.0),\n",
    "    ],\n",
    "    metric=rec100,\n",
    "    eval_method=ratio_split,\n",
    "    n_trails=20,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As evident from above, there are two types of parameter search domain, namely `Discrete` and `Continuous`. More details in the [documentation](https://cornac.readthedocs.io/en/latest/hyperopt.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run an experiment\n",
    "Next, we put everything into an experiment and run it. The results on the validation set, as well as the test results corresponding to the best value of *beta* found by each search method (as measured on the validation set in terms of Recall@100) will be displayed. One can print out more information during this step by setting `verbose=True` when instantiating `VAECF`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "VALIDATION:\n",
      "...\n",
      "                   | Recall@100 | Time (s)\n",
      "------------------ + ---------- + --------\n",
      "GridSearch_VAECF   |     0.5663 |   1.2433\n",
      "RandomSearch_VAECF |     0.5664 |   1.2834\n",
      "\n",
      "\n",
      "TEST:\n",
      "...\n",
      "                   | Recall@100 | Train (s) | Test (s)\n",
      "------------------ + ---------- + --------- + --------\n",
      "GridSearch_VAECF   |     0.5576 |  154.9834 |   1.6374\n",
      "RandomSearch_VAECF |     0.5567 |  273.1063 |   1.4689\n",
      "\n"
     ]
    }
   ],
   "source": [
    "cornac.Experiment(\n",
    "    eval_method=ratio_split,\n",
    "    models=[gs_vaecf, rs_vaecf],\n",
    "    metrics=[rec100],\n",
    "    user_based=False,\n",
    ").run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The best *beta* values found by our search methods are as follows,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Grid search: beta = 1.40\n",
      "Random search: beta = 1.46\n"
     ]
    }
   ],
   "source": [
    "print('Grid search: beta = {:.2f}'.format(gs_vaecf.best_params.get('beta')))\n",
    "print('Random search: beta = {:.2f}'.format(rs_vaecf.best_params.get('beta')))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is also possible to access the best model through the attribute `best_model`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Perform searches for multiple parameters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also perform a joint search for multiple parameters. For instance, in addition to *beta*, lets include the number of epochs. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "VALIDATION:\n",
      "...\n",
      "                   | Recall@100 | Time (s)\n",
      "------------------ + ---------- + --------\n",
      "GridSearch_VAECF   |     0.5737 |   0.9325\n",
      "RandomSearch_VAECF |     0.5786 |   0.9425\n",
      "\n",
      "\n",
      "TEST:\n",
      "...\n",
      "                   | Recall@100 | Train (s) | Test (s)\n",
      "------------------ + ---------- + --------- + --------\n",
      "GridSearch_VAECF   |     0.5666 |  129.2224 |   1.0327\n",
      "RandomSearch_VAECF |     0.5656 |  140.6678 |   0.9796\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# GridSearch\n",
    "gs_vaecf = GridSearch(\n",
    "    model=vaecf,\n",
    "    space=[\n",
    "        Discrete(\"n_epochs\", [20, 50, 100]),\n",
    "        Discrete(\"beta\", [0.0, 0.4, 0.8, 1.0, 1.4, 1.8, 2.0]),\n",
    "    ],\n",
    "    metric=rec100,\n",
    "    eval_method=ratio_split,\n",
    ")\n",
    "\n",
    "# RandomSearch\n",
    "rs_vaecf = RandomSearch(\n",
    "    model=vaecf,\n",
    "    space=[\n",
    "        Discrete(\"n_epochs\", [20, 50, 100]),\n",
    "        Continuous(\"beta\", low=0.0, high=2.0),\n",
    "    ],\n",
    "    metric=rec100,\n",
    "    eval_method=ratio_split,\n",
    "    n_trails=20,\n",
    ")\n",
    "\n",
    "# Put everything into an experiment and run it\n",
    "cornac.Experiment(\n",
    "    eval_method=ratio_split,\n",
    "    models=[gs_vaecf, rs_vaecf],\n",
    "    metrics=[rec100],\n",
    "    user_based=False,\n",
    ").run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The best hyperparameter settings are:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Grid search:  {'beta': 1.0, 'n_epochs': 50}\n",
      "Random search:  {'beta': 0.8218487454180379, 'n_epochs': 50}\n"
     ]
    }
   ],
   "source": [
    "print('Grid search: ', gs_vaecf.best_params)\n",
    "print('Random search: ', rs_vaecf.best_params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output of this experiment is different from the previous one, due to a \"correlation\" between the effect of the considered parameters on training. Recall that *beta* can be thought of as a regularization coefficient controlling the effect of the KL term in VAECF. Previously, we fixed the number of epochs to 100, and more regularization seemed to be necessary to avoid overfitting (our searches selected higher beta values). However, the latter case reveals that we may achieve competitive performance with a smaller *beta* if we reduce the number of training iterations.   "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
